Global musical diversity is largely independent of linguistic and genetic histories

Music is a universal yet diverse cultural trait transmitted between generations. The extent to which global musical diversity traces cultural and demographic history, however, is unresolved. Using a global musical dataset of 5242 songs from 719 societies, we identify five axes of musical diversity and show that music contains geographical and historical structures analogous to linguistic and genetic diversity. After creating a matched dataset of musical, genetic, and linguistic data spanning 121 societies containing 981 songs, 1296 individual genetic profiles, and 121 languages, we show that global musical similarities are only weakly and inconsistently related to linguistic or genetic histories, with some regional exceptions such as within Southeast Asia and sub-Saharan Africa. Our results suggest that global musical traditions are largely distinct from some non-musical aspects of human history.


Line: Phrase length, 1-5 (long phrases to short phrases)

All Cantometrics Lines were originally recorded on a scale with a maximum value of 13. Each Cantometrics variable had a different number of possible responses, meaning this scale was not spaced equally across variables. We rescale all variables in Cantometrics to be between 0 and 1. To do this, each variable is first rescaled from the 13-point scale to a linearly increasing scale. For example, Line 5 (Tonal blend of the vocal group) had 5 codes, 1, 4, 7, 10, and 13, which are respectively recoded to 1, 2, 3, 4, and 5. Then, the linear scale is standardized using the following formula:

$$x_{\text{rescaled}} = \frac{x - 1}{x_{\max} - 1}$$

where $x$ is the code on the linear scale and $x_{\max}$ is the maximum value of that scale. Subtracting 1 from the code and maximum value means the rescaled variables start at 0.
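The two rescaling steps can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function name `rescale_code` is hypothetical, and the standardization is assumed to be (code - 1) / (maximum - 1), as implied by the description of subtracting 1 from both the code and the maximum value.

```python
def rescale_code(code, allowed_codes):
    """Map a raw code from the 13-point scale onto the 0-1 interval.

    allowed_codes lists the codes a Line can take on the 13-point scale,
    e.g. [1, 4, 7, 10, 13] for Line 5 (Tonal blend of the vocal group).
    """
    ordered = sorted(allowed_codes)
    if code not in ordered:
        raise ValueError(f"{code} is not a valid code for this Line")
    if len(ordered) < 2:
        raise ValueError("a Line needs at least two codes to be rescaled")

    # Step 1: recode to a linearly increasing scale 1, 2, ..., k.
    linear = ordered.index(code) + 1
    maximum = len(ordered)

    # Step 2 (assumed formula): subtract 1 from the code and the maximum
    # so that the rescaled variable starts at 0 and ends at 1.
    return (linear - 1) / (maximum - 1)


LINE_5_CODES = [1, 4, 7, 10, 13]
rescaled = [rescale_code(code, LINE_5_CODES) for code in LINE_5_CODES]
```

Under these assumptions, the five Line 5 codes land on evenly spaced values between 0 and 1, regardless of how they were spaced on the 13-point scale.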

S1.4 Rescaling
We reverse the codings of several existing Cantometric variables so that all variables align high values with a more frequent occurrence of what the variable measures. These are listed in Table S2. Codes are reversed by subtracting the standardized scores from 1.
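As a small illustration of the reversal, the sketch below applies 1 minus the standardized score to a chosen set of variables. The names `reverse_lines`, `line_a`, and `line_b` are hypothetical placeholders and do not correspond to actual entries in Table S2.

```python
def reverse_score(score):
    """Reverse a standardized (0-1) score by subtracting it from 1."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must already be standardized to 0-1")
    return 1.0 - score


def reverse_lines(song, lines_to_reverse):
    """Return a copy of a song's scores with the selected Lines reversed."""
    return {
        line: reverse_score(value) if line in lines_to_reverse else value
        for line, value in song.items()
    }


# Hypothetical song with two standardized variables; only line_a is reversed.
song = {"line_a": 0.25, "line_b": 0.75}
reversed_song = reverse_lines(song, {"line_a"})
```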

S1.6 Pairing Cantometrics to Genetics and Language data
To analyze the patterns of between-group differences in musical style, Cantometric societies are paired to both a language and a genetic sample. Languages are identified using Glottocodes (ref. 2). The Glottocode is used to link the Cantometric society to a genetic population from the database GeLaTo (ref. 3), a global genetic diversity panel annotated for linguistic affiliation, and from other published genetic data (see Supplementary Data S5 for a list of genetic publications used). The Cantometric society is associated with the genetic population either through a perfect Glottocode match, through alternative population name matches, or, if no identical match is found, through a closely related language. Geographic proximity between the location of a Cantometric society and the genetic population was also considered.
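The matching cascade described above could be sketched as follows. This is an illustrative assumption about how the three criteria might be ordered, not the authors' pipeline: the function `pair_society`, the record fields, and all Glottocode-like strings are hypothetical, and geographic proximity is not modeled here.

```python
def pair_society(society, populations, related_languages):
    """Return (population_name, match_type), or (None, "unmatched")."""
    # 1. Perfect Glottocode match.
    for pop in populations:
        if pop["glottocode"] == society["glottocode"]:
            return pop["name"], "glottocode"

    # 2. Alternative population name match (case-insensitive).
    names = {n.lower() for n in [society["name"], *society.get("alt_names", [])]}
    for pop in populations:
        pop_names = {n.lower() for n in [pop["name"], *pop.get("alt_names", [])]}
        if names & pop_names:
            return pop["name"], "name"

    # 3. Fallback: a population speaking a closely related language.
    for code in related_languages.get(society["glottocode"], []):
        for pop in populations:
            if pop["glottocode"] == code:
                return pop["name"], "related_language"

    return None, "unmatched"


# Toy records with made-up codes, for illustration only.
populations = [
    {"name": "Pop A", "glottocode": "aaaa0001"},
    {"name": "Pop B", "glottocode": "bbbb0002", "alt_names": ["Society B"]},
    {"name": "Pop C", "glottocode": "cccc0003"},
]
related = {"dddd0004": ["cccc0003"]}

match_exact = pair_society({"name": "Society A", "glottocode": "aaaa0001"}, populations, related)
match_name = pair_society({"name": "Society B", "glottocode": "zzzz0009"}, populations, related)
match_related = pair_society({"name": "Society D", "glottocode": "dddd0004"}, populations, related)
match_none = pair_society({"name": "Society E", "glottocode": "eeee0005"}, populations, related)
```

Returning the match type alongside the population makes it possible to report how many pairings relied on each criterion.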

S1.7 Cantometrics Coding Procedure and Reliability
The Cantometric codings in the Global Jukebox have undergone an extensive quality control procedure, with testing of reliability and accuracy documented in a previous publication (ref. 1). The following quote summarizes the results of these quality controls (see Fig. S8 and Table S12 in Wood et al. for a detailed breakdown of reliability statistics for each variable under different coder conditions and combinations): "Overall, our analyses suggest that both coding reliability (mean κ = 0.54; Fig. S8 and Table S12) and accuracy (approximately 0.4-1% rate of unambiguous coding/data entry errors; Fig. S9) are at acceptable levels on average. However, there was also substantial variation in reliability across variables. Some variables showed near-perfect consensus: for example Line [...]"

The construction of the latent variables is determined using a combination of principal component analysis and an existing analysis of the Cantometrics dataset. First, we implemented a principal component analysis using all Cantometrics variables, replicating the process used in early Cantometrics work, although expanding from the initial subset of ~1,800 songs to 5,242 songs (ref. 5). Through this process we identify a subset of Cantometrics variables which contain dependencies for songs that involve either solo singers or solo instrumentalists (Tables S6 and S7). For example, if a song has a solo singer (Line 1 code 2 or 3), it must also have no tonal blend (Line 5 code 1). These dependencies meant the first two principal components effectively differentiate between songs with group or solo instrumental performances (PC1) and group or solo vocal performances (PC2). This is visualized in Fig. S4. There are two solutions to this: either analyze all songs while excluding all but one of the dependent variables in Tables S6 and S7, or exclude all songs which contain solo instrumentalists or solo singers. We decide the former is a better solution for studying global musical diversity.

Fig. S4: These plots display the first and second principal components from the Cantometrics dataset using all variables and all songs from societies with two or more songs (n = 5,474). When using all variables, the first and second principal components consist primarily of the dependencies described in Tables S6 and S7. We plot the principal components based on two categorical variables: Line 2 (code 1), to distinguish songs with one or no instruments from songs with multiple instruments, and Line 1 (codes 1, 2, and 3), to distinguish songs where one singer is heard at a time from songs with concurrent singers. We do not use variables that contain dependencies as a result.

Line: Number of phrases, 1-13 (eight phrases before repeat to one or two phrases)
Line: Position of final tone, 1-5 (final note is lowest to final note is highest)

Fig. S1: A histogram showing the distribution of songs per society (N societies = 1,026). Songs are distributed unevenly between societies, with 73 songs being the most any one society has and 232 societies containing only one song.

Fig. S2: Map of 222 Cantometrics societies (represented by 3,063 songs). These are the societies for which we have ten songs or more. The 44 societies matched to both genetic and linguistic data are indicated in red (societies without matching data are in grey).

Fig. S3: Map of 95 Cantometrics societies (represented by 689 songs). These are the societies which are matched to the SCCS. The 21 societies matched to both genetic and linguistic data are indicated in red (societies without matching data are in grey).

Fig S7. Variogram of each musical dimension, as Musical PhiST distances, derived from the latent variable analysis for the 2-songs-or-more dataset (N = 117 societies). Autocorrelation (r) is shown on the y-axis, with correlations measured at 500 km intervals. White shapes indicate significant autocorrelation and black shapes indicate non-significant autocorrelation. Error bars show the 95% confidence intervals for each distance.

Fig S8. Variogram of Genetic FST distances, phylogenetic distance from the global language phylogeny, and Musical PhiST distances for the 10-songs-or-more dataset (N = 44 societies). Autocorrelation (r) is shown on the y-axis, with correlations measured at 500 km intervals. White shapes indicate significant autocorrelation and black shapes indicate non-significant autocorrelation. Error bars show the 95% confidence intervals for each distance.

Fig S11. Pairwise plots between a PhiST matrix of all Cantometrics variables, and genetic, linguistic, and spatial distance. Linear regression line shown in red with Pearson's r value in the top right. Plots show distances between pairs of 117 societies (N = 6,786).

Fig S12. Pairwise plots between a PhiST matrix of Articulation, and genetic, linguistic, and spatial distance. Linear regression line shown in red with Pearson's r value in the top right. Plots show distances between pairs of 117 societies (N = 6,786).

Fig S13. Pairwise plots between a PhiST matrix of Ornamentation, and genetic, linguistic, and spatial distance. Linear regression line shown in red with Pearson's r value in the top right. Plots show distances between pairs of 117 societies (N = 6,786).

Fig S14. Pairwise plots between a PhiST matrix of Rhythm, and genetic, linguistic, and spatial distance. Linear regression line shown in red with Pearson's r value in the top right. Plots show distances between pairs of 117 societies (N = 6,786).

Fig S15. Pairwise plots between a PhiST matrix of Dynamics, and genetic, linguistic, and spatial distance. Linear regression line shown in red with Pearson's r value in the top right. Plots show distances between pairs of 117 societies (N = 6,786).

Fig S16. Pairwise plots between a PhiST matrix of Tension, and genetic, linguistic, and spatial distance. Linear regression line shown in red with Pearson's r value in the top right. Plots show distances between pairs of 117 societies (N = 6,786).
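The pair count quoted in these captions follows from 117 societies yielding 117 × 116 / 2 = 6,786 unordered pairs. The sketch below checks that arithmetic and shows one way a Pearson correlation between two distance matrices could be computed from their upper triangles. The helper names and the toy matrices are hypothetical and do not reproduce the PhiST, genetic, linguistic, or spatial distances used in the plots; note also that because pairwise distances are not independent observations, a significance test would typically need a permutation-based approach rather than the standard p-value.

```python
from itertools import combinations
from math import sqrt


def n_pairs(n_societies):
    """Number of unordered pairs among n societies."""
    return n_societies * (n_societies - 1) // 2


def upper_triangle(dist):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(dist)
    return [dist[i][j] for i, j in combinations(range(n), 2)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy distance matrices for 117 hypothetical societies placed on a line.
n = 117
toy_a = [[abs(i - j) for j in range(n)] for i in range(n)]
# A linear transform of toy_a; the diagonal is ignored by upper_triangle.
toy_b = [[2.0 * d + 1.0 for d in row] for row in toy_a]

pairs_a = upper_triangle(toy_a)
r = pearson_r(pairs_a, upper_triangle(toy_b))
```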

Table S2. List of Lines where high values indicate less of a musical feature. This table shows the high and low values on the original scale, which are reversed in our analyses.

Note also that a mean Kappa of 0.54 is much higher than that of comparable cross-cultural music datasets published in high-profile journals (e.g., mean Kappa of 0.24 in Mehr et al., 2019, Science [reported in Wood et al. 2022 Supporting Information]; mean Kappa of 0.45 in Savage et al., 2015, PNAS Supporting Information).

Table S3. Latent variable model results summary.

Table S4. Summary of genetic, linguistic, and geographic comparison results.

Table S5. Equation description of the latent variable model (N songs = 5,242), showing the description of each Cantometrics variable and its standardized weight, indicating its contribution to the latent variable.

Table S6: Table of dependencies in vocal variables. Each dependency is described as the Line number, followed after a colon by the code for that Line on which the dependency occurs. *Note: Line 1: 1 (No Singers) cannot occur in this dataset, although it does exist in the coding scheme; all songs must have singers.

Table S7: Table of dependencies in orchestral variables. Each dependency is described as the Line number, followed after a colon by the code for that Line on which the dependency occurs.

Table S8: Variable weights for principal component and latent variable models for the 2-song dataset (N = 5,242). Principal components consider the weight of all variables on the latent dimensions. Latent variables are designed using a combination of theoretical knowledge of global music, the results of existing analyses of the dataset, and the weightings in the principal component analysis.

To help conceptualize the meaning of each dimension, we describe the extremes of each dimension with audio examples. All songs are available at https://theglobaljukebox.org/. Songs that score highly on Articulation contain precise enunciation of non-repeating lyrics (Example song: Sundanese Song 1 by Javanese performers; song 1562), whereas low-scoring songs frequently repeat text with slurred enunciation (Mens' Chorus 2, by Dani performers; song 986). Songs with high levels of Ornamentation show lots of vocal embellishment, tremolo, or melisma (Esashi Oiwake by Hokkaido Japanese performers; song 364), whereas singers in low-Ornamentation songs use steady and consistent notes (Zavan by Ouldeme performers; song 30146). Songs that score low on Rhythm have slow, irregular meters and long phrases (Caravan Bells and the Song of the teamsters by Tibetan performers; song 398, C in figure 1 in the main text), whereas songs that score high on Rhythm have a fast tempo, a regular meter, and short phrases, as exemplified by the Mbuti song Alima (song 9260, D in figure 1). Songs that score high on Dynamics are loud and intense (Dance with Long Horns by Khattak performers; song 749), whereas scoring low indicates a soft song, such as Efalachid Gelat, a lullaby-love song from the Ulithi Atoll (song 2628). Songs with high Tension have singers that use very nasal, raspy, and constrained voices (Song with a Xylophone, by Burmese performers; song 2559, A in figure 1). At the other end of the spectrum are low-Tension songs, where singing sounds more relaxed and 'open', as in the Mbendjele song Djokobo (song 30063, B in figure 1).

Table S9: Pearson's correlations between latent variables built using a dataset containing a minimum of two songs per society (5,242 songs and 719 societies), a minimum of ten songs per society (3,039 songs and 220 societies), and the Standard Cross-Cultural Sample (724 songs and 110 societies). All variables show significant and strong correlations. The N value for each correlation is the minimum N of the two datasets being compared.

Table S10: Pearson's correlations between latent variables built using the full latent variable model and using only variables with high reliability (N songs = 5,242). High reliability is determined by having a Cohen's Kappa value greater than 0.4, which some have proposed as a minimum acceptable level of reliability (e.g., in clinical contexts; Sim & Wright, 2005). The removed variables are Lines 17, 24, 28, 30, 33, and 34. Line 31 shows low reliability, but removing this variable meant the model did not converge, so it remains. Comparisons were performed using the two-song-per-society dataset. In two instances, Rhythm and Tension, this left only one variable, and so the latent variable was compared to that remaining variable. Excluding Tension, all variables correlate highly regardless of whether the full or the high inter-rater reliability model is used. Results involving Tension should be interpreted cautiously.
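To make the reliability threshold concrete, the sketch below computes Cohen's Kappa for two coders and keeps only variables whose Kappa exceeds 0.4. It is a generic illustration of the statistic, not the procedure used to produce the published reliability figures; the function names, the toy ratings, and the variable names `line_x`, `line_y`, and `line_z` are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters coding the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty item set")
    n = len(rater_a)

    # Observed agreement: proportion of items coded identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal proportions.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def high_reliability(kappas, threshold=0.4):
    """Names of variables whose Kappa is strictly greater than the threshold."""
    return sorted(name for name, k in kappas.items() if k > threshold)


# Toy ratings: agreement on 3 of 4 items, chance agreement of 0.5.
kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])

# Hypothetical per-variable Kappa values.
kept = high_reliability({"line_x": 0.62, "line_y": 0.31, "line_z": 0.4})
```

Because the comparison is strict, a variable sitting exactly at 0.4 is excluded in this sketch, matching the "greater than 0.4" wording.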

Table S11. Delta scores for three regional subsets of 50 societies: Africa, Oceania, and Europe, for three datasets: 2 songs or more, 10 songs or more, and the SCCS sample.

Fig S10. Visualization of table

Table S12. Pearson correlation of partial RDA R² results across the three datasets. The 2-songs-or-more dataset contains 5,242 songs and 719 societies, the 10-songs-or-more dataset contains 3,039 songs and 220 societies, and the SCCS dataset contains 724 songs and 110 societies. The N value for each comparison is the minimum N of the two datasets being compared.